We're seeing some suspicious-looking results from our so-called #de-pr-roulette channel: a few people seem to get picked far more often than others. That calls for diving in to see whether the picker is behaving like a fair random draw or wandering haplessly through the bitfield, confused and entropic.
import statistics as stat
import random
import plotly.express as px
people = [
'l.huang','j.stuckey','Yash','Amber','Ravali','Doreen','Brad',
'Bhanu','Andrew','dayang','Phil','Ben','Ryan','Bree','Jacob',
'russell','Mike'
]
actual = [13, 12, 10, 10, 9, 8, 8, 7, 7, 7, 6, 5, 4, 3, 2, 2, 2]
actual_variance = stat.variance(actual)
actual_variance
11.816176470588236
N_SIMULATIONS = 10000
N_SELECTIONS = 115
results = {i: {'selections': {k: 0 for k in people}, 'variance': None} for i in range(N_SIMULATIONS)}
for k in range(N_SIMULATIONS):
    for i in range(N_SELECTIONS):
        this = random.choice(people)
        results[k]['selections'][this] += 1
for x in results:
    results[x]['variance'] = stat.variance(results[x]['selections'].values())
variances = [x['variance'] for x in results.values()]
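The simulation above can also be written more compactly. The following is a minimal sketch, not the notebook's original code: the function name `simulate_variances`, the `seed` parameter, and the placeholder reviewer names are assumptions added for illustration. It draws all picks for a run in one call and seeds the generator so the numbers can be reproduced.

```python
import random
import statistics as stat
from collections import Counter


def simulate_variances(people, n_selections, n_simulations, seed=None):
    """Return the sample variance of per-person pick counts for each simulated run."""
    rng = random.Random(seed)  # seeding is an assumption, added for reproducibility
    variances = []
    for _ in range(n_simulations):
        # Draw every pick for this run at once, uniformly with replacement.
        picks = Counter(rng.choices(people, k=n_selections))
        # Counter returns 0 for missing keys, so never-picked people still count.
        counts = [picks[p] for p in people]
        variances.append(stat.variance(counts))
    return variances


# Placeholder names stand in for the real roster.
demo_people = [f"reviewer_{i}" for i in range(17)]
demo_variances = simulate_variances(demo_people, n_selections=115, n_simulations=200, seed=0)
print(len(demo_variances), min(demo_variances), max(demo_variances))
```

Keeping only the list of variances, rather than a nested dictionary per run, is enough for everything that follows, since the later cells only use the variances.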
With 10,000 simulated runs, we have a good approximation of how the sample variance is distributed when every pick is uniformly random.
Let's calculate the mean and standard deviation of those simulated variances and visualize the distribution to understand it better.
pop_variance_mean = stat.mean(variances)
pop_variance_stdev = stat.stdev(variances)
pop_variance_mean
6.780713970588235
pop_variance_stdev
2.4113048413042923
px.histogram(variances,title='Distribution of sample variances',nbins=50)
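As a sanity check on the simulated mean, the expected sample variance under uniform picking can be worked out directly. This is a sketch under the assumption that every pick is independent and uniform over the 17 people. Each person's count is then binomial, and because the counts always sum to the same total, the mean count is fixed at n/k. The variable names below are illustrative.

```python
n_selections = 115  # total picks per run (N_SELECTIONS)
n_people = 17       # size of the roster (len(people))

# Each person's count is Binomial(n, 1/k), with variance n * (1/k) * (1 - 1/k).
per_person_variance = n_selections * (1 / n_people) * (1 - 1 / n_people)

# The sample variance sums k squared deviations around the fixed mean n/k
# and divides by k - 1, so its expectation is k / (k - 1) times the above.
expected_sample_variance = per_person_variance * n_people / (n_people - 1)

print(expected_sample_variance)  # simplifies algebraically to n / k
```

The result simplifies to 115 / 17, about 6.76, which agrees closely with the simulated mean of 6.78.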
It seems unlikely that a truly random picker would select one person 13 times out of 115 (which contributes to a sample variance of 11+), but the distribution of simulated variances shows that at least a few runs do reach variances this high.
probability_of_variance = len([x for x in variances if x >= actual_variance])/len(variances)
probability_of_variance
0.0346
So there's a small but nonzero chance, roughly 3.5% in this simulation, that a uniformly random picker would produce a distribution at least this uneven.
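The variance comparison is closely related to Pearson's chi-square goodness-of-fit statistic, which offers a cross-check that does not depend on the simulation. The sketch below uses only the standard library and assumes the null hypothesis of uniform picks. It computes the statistic and shows that it is a rescaling of the sample variance, `(k - 1) * variance / (n / k)`.

```python
import statistics as stat

actual = [13, 12, 10, 10, 9, 8, 8, 7, 7, 7, 6, 5, 4, 3, 2, 2, 2]

n = sum(actual)   # total picks observed
k = len(actual)   # number of people
expected = n / k  # expected count per person if picks are uniform

# Pearson's chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in actual)
degrees_of_freedom = k - 1

# The same number, recovered from the sample variance used above.
chi_square_from_variance = (k - 1) * stat.variance(actual) / expected

print(chi_square, degrees_of_freedom)
```

The statistic comes out near 27.95 with 16 degrees of freedom. Standard chi-square tables put the critical values for 16 degrees of freedom at about 26.30 for p = 0.05 and 28.85 for p = 0.025, so the approximate p-value falls between those two, in line with the simulated 0.0346.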